ABSTRACT

P. O. Johnson and J. Neyman (1936) proposed a general
linear hypothesis testing procedure for testing the null hypothesis 
of no treatment difference in the presence of some covariates. This 
is generally known as the Johnson-Neyman (JN) technique. The need for 
the hypothesis testing step (often omitted) as originally presented 
and the appropriateness of making simultaneous inferences after the 
slope homogeneity assumption test were investigated. Three regression 
settings were used to simulate the conditions of slight, moderate, 
and severe slope heterogeneity. Within each setting, 3 sample size 
ratios (10:10, 20:20, and 30:30, respectively) were considered with 
10,000 simulated experiments in each sample size ratio. Within 9 
artificially generated data conditions, the total number of simulated 
experiments was 90,000. Simulation results indicate that the 
hypothesis testing procedure as originally presented was unnecessary, 
whereas the slope homogeneity test was important for making 
simultaneous inference. When the slope homogeneity test was rejected,
the simultaneous error rate was found to approximate the nominal
alpha level as set forth prior to conducting the research. A caution
is issued against applying the JN technique when sample sizes are 
small. Seven tables present analysis results, and there is a 
nine-item list of references. (SLD) 
ABSTRACT

This paper reviewed a long-forgotten aspect of the Johnson-Neyman (J-N)
technique: hypothesis testing. The originally proposed J-N technique
was a two-step procedure: (1) hypothesis testing, that is, test whether
there is a treatment effect somewhere in the entire covariate score range;
if the answer is yes, we then proceed to (2) find the region of
significance. Instead of performing the first step, educational
researchers usually do a test of the slope homogeneity assumption. If the
slope homogeneity assumption is rejected, the region of significance is
then computed. It has been shown by some researchers that the region of
significance as derived by Johnson and Neyman was non-simultaneous.
Nevertheless, educational researchers typically make simultaneous
inferences based on the computed region of significance. The purpose of
this study was to investigate the need for the hypothesis testing step as
originally presented in Johnson and Neyman's paper, and the
appropriateness of making simultaneous inferences after the slope
homogeneity assumption test.

Three regression settings were employed to simulate the conditions of
slight, moderate, and severe slope heterogeneity. Within each setting,
three sample size ratios were considered (10:10, 20:20, and 30:30) with
10,000 simulated experiments in each sample size ratio. With nine
artificially generated data conditions, the total number of simulated
experiments in this study was 90,000. The simulation results indicated
that the hypothesis testing procedure as originally presented was
unnecessary, whereas the slope homogeneity test commonly performed before
the application of the J-N technique was important for making simultaneous
inferences. When the slope homogeneity test was rejected, the simultaneous
error rate was found to approximate the nominal alpha level as set forth by
the researcher prior to conducting the research. Nevertheless, a caution
was issued against the application of the J-N technique when sample sizes
are small.
INTRODUCTION

Johnson and Neyman (1936) proposed a general linear hypothesis
testing procedure for testing the null hypothesis of no treatment
difference in the presence of some covariates. This is generally known as
the Johnson-Neyman (J-N) technique, of which ANCOVA and gain score
analysis are only two special cases.

The original J-N technique was presented in the context of two
treatment groups (of sizes n_1 and n_2, respectively) and two covariates.
For the sake of simplicity and without any loss in generalizability, only
one covariate is considered in this paper. Let Y be the criterion variable
and X be the covariate. Johnson and Neyman expressed the expected value of
the criterion variable as a function of X, i.e.,

    E(Y) = F_1(X) = a_0 + a_1 X  for group 1, and
    E(Y) = F_2(X) = b_0 + b_1 X  for group 2.                        [1]

Linear Hypothesis 

The hypothesis, H(X), tested was that two treatment group means are
equal. Unlike the two-group t-test, the hypothesis posed here takes into
consideration the concomitant variable system. It should also be noted
that H(X) was not intended for any particular system of fixed values, say
comparing the treatment group mean difference only at X = X_1 or X = X_2.
Instead, the research question of interest was "Is there a system of
values of X for which the hypothesis H(X) should be rejected?" Therefore
the null hypothesis tested by the J-N technique was "There is no system of
values of X at which two treatment means are different" (see Johnson &
Fay, 1950, p. 351). This null hypothesis was expressed in the following
linear form,

    H(X): a_0 - b_0 + (a_1 - b_1)X = 0.                              [2]

Test Criterion

The test statistic for the above hypothesis involved the computation
of the likelihood criterion L (Johnson and Neyman, 1936, p. 77),

    L = \frac{SSE_a}{SSE_r},                                         [3]

where SSE_a is the absolute minimum error sum of squares from fitting all
four regression parameters as in expression [1] for the two groups
separately (SSE_a is the sum of the two error sums of squares for the two
groups), and SSE_r is the relative error sum of squares from imposing the
restrictions as set forth in the null hypothesis. Let Y_{ij} denote the
outcome score for the ith individual in group j, and n_j denote the number
of observations in group j. The SSE_a term is obtained as

    SSE_a = \sum_{i=1}^{n_1} (Y_{i1} - a_0 - a_1 X_{i1})^2
          + \sum_{i=1}^{n_2} (Y_{i2} - b_0 - b_1 X_{i2})^2,          [4]



and SSE_r is the corresponding minimum when the restrictions of the null
hypothesis (a_0 = b_0, a_1 = b_1) are imposed.

Since SSE_a cannot be greater than SSE_r, L is bounded by 0 and 1. The
smaller L, the less likely the null hypothesis is to be true. The
distribution of L assumes the form of a Beta probability distribution with
two parameters, p = (n_1 + n_2 - s)/2 and q = r/2. The value of s is the
number of independent parameters of which the population mean is assumed to
be a function with known coefficients (s = 4 as in expression [1]) and r is
the number of equations required to express the hypothesis tested (r = 2
for a_0 = b_0, a_1 = b_1). A table of the values of L at various
significance levels can be found in Tables of the Incomplete Beta Function
(Pearson, 1956). Johnson and Neyman (1936) also presented a simplified
Incomplete Beta Function table for significance levels .01 and .05, at some
values of p and q.

Relationship Between L and Snedecor's F Distribution 

It is seen that L is a ratio of one sum of squares to the total of two
sums of squares, expressed in terms of

    L = \frac{\chi_a^2}{\chi_a^2 + \chi_1^2},                        [6]

where \chi_1^2 is the "extra" component due to chance fluctuations, with r
degrees of freedom, and \chi_a^2 has (n_1 + n_2 - s) degrees of freedom.
Bickel and Doksum (1977, pp. 13-17) discussed the relationship between the
Incomplete Beta Function and Snedecor's F distribution,

    F_{(v,r)} = \frac{r}{v} \cdot \frac{L}{1 - L},                   [7]

where v is the degrees of freedom associated with SSE_a. Taking the inverse
of the value obtained in equation [7] gives the familiar central F
distribution with degrees of freedom r and v. Therefore, the p-value of
the test statistic in [3] can also be obtained from an F distribution as
obtained from [7].
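
The step from L to F can be written out explicitly. The short derivation below is a sketch added for clarity; it uses only the definitions in [3] and [6] and writes v = n_1 + n_2 - s. The closed-form tail area in the last line holds only for r = 2 and is a standard property of an F distribution with 2 numerator degrees of freedom, not a result stated in the original paper.

```latex
\begin{align*}
\frac{1 - L}{L} &= \frac{SSE_r - SSE_a}{SSE_a} = \frac{\chi_1^2}{\chi_a^2}, \\
F_{(r,v)} &= \frac{v}{r}\cdot\frac{1 - L}{L}
           = \frac{(SSE_r - SSE_a)/r}{SSE_a/v}, \\
\Pr\{F_{(2,v)} > f\} &= \left(1 + \frac{2f}{v}\right)^{-v/2} = L^{v/2}
  \quad\text{when } r = 2 .
\end{align*}
```

Small values of L therefore correspond to large values of F_{(r,v)}, so both statistics reject H(X) in the same direction.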

An Easy Way to Obtain F_{(r,v)}

Define a dummy variable T such that T_i = 1 when the observation is
from group one, and T_i = 0 otherwise. Combining the two regression lines
in expression [1], we may reparameterize the model as

    E(Y) = \beta_0 + \beta_1 T + \beta_2 X + \beta_3 TX,             [8]

where \beta_1 = (a_0 - b_0) and \beta_3 = (a_1 - b_1). The null hypothesis
H(X) can be expressed as

    H(X): \beta_1 = 0, \beta_3 = 0.

The linear model under the null hypothesis is

    E(Y) = \beta_0 + \beta_2 X.                                      [9]
The F value obtained from expression [7] can simply be computed as follows:

    F_{(r,v)} = \frac{(SSE_9 - SSE_8)/r}{SSE_8/(N - s)},

where SSE_9 is the error sum of squares associated with expression [9],
SSE_8 is the error sum of squares associated with expression [8],
N = n_1 + n_2, and r and s are defined as before.
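
The computation above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's program: the function names `fit_line` and `omnibus_test` and the toy data are hypothetical. Fitting a separate line to each group reproduces the error sum of squares of model [8], and fitting one line to the pooled data reproduces that of model [9]. Because r = 2 here, the Beta(p, 1) tail area reduces to L raised to the power p = (N - s)/2, so no statistical library is needed; that shortcut does not carry over to other values of r.

```python
def fit_line(x, y):
    """Least-squares line for one sample; returns (intercept, slope, sse)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, sse


def omnibus_test(x1, y1, x2, y2):
    """Omnibus test of H(X): equal intercepts and equal slopes (r = 2, s = 4)."""
    r, s = 2, 4
    v = len(x1) + len(x2) - s
    # Full model [8]: separate lines per group, so SSE is the sum of both fits.
    sse_full = fit_line(x1, y1)[2] + fit_line(x2, y2)[2]
    # Reduced model [9]: one common line fitted to the pooled data.
    sse_reduced = fit_line(list(x1) + list(x2), list(y1) + list(y2))[2]
    L = sse_full / sse_reduced
    f_value = ((sse_reduced - sse_full) / r) / (sse_full / v)
    # For r = 2, L follows Beta(v/2, 1) under H(X), so P(L <= observed) = L**(v/2).
    p_value = L ** (v / 2)
    return L, f_value, p_value


# Hypothetical toy data: five observations per group.
x1, y1 = [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]
x2, y2 = [1, 2, 3, 4, 5], [5.2, 4.1, 2.9, 2.2, 0.8]
L, f_value, p_value = omnibus_test(x1, y1, x2, y2)
```

H(X) would be rejected when `p_value` falls below the chosen significance level; the same decision follows from comparing `f_value` with a tabled critical value of F with 2 and N - 4 degrees of freedom.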

Role of Hypothesis Testing in Johnson-Neyman Technique 

The first step of the originally proposed J-N technique was to
perform the omnibus hypothesis H(X) testing procedure as described above.
If H(X) is rejected, we conclude that the treatment difference exists over
a set of the covariate points, which has been referred to as the "region
of significance", denoted by R. The computation of the region of
significance can be found in textbooks (e.g., Huitema, 1980). If we fail
to reject H(X), the problem is complete at this point, and no attempt
should be made to compute the region of significance.

Non-Simultaneous Versus Simultaneous Inferences

Using the scheme of comparing two regression lines, Potthoff (1964)
and Rogosa (1980) rightfully pointed out that the derived region of
significance as originally proposed by Johnson and Neyman (1936) is
non-simultaneous. The treatment difference can be validly inferred only
for any single covariate point over the region of significance, not for
all points over R simultaneously. For most educational researchers,
however, the purpose of using the J-N technique is to find a set of X
values such that one may claim that the treatment difference exists for
all X points over R at a prespecified α level. Furthermore, because the
covariates used in educational and psychological research are mostly
random, non-simultaneous inference is seldom meaningful. Since the exact
error probability of making a simultaneous inference error based on the
non-simultaneous region of significance when the covariate is random
cannot be theoretically derived, the extent of the inappropriateness of
making such simultaneous inferences is unknown.

THE CURRENT STUDY 

Potthoff (1964) applied a Scheffé-like procedure to the original J-N
technique to derive a simultaneous region of significance. Nevertheless,
this procedure has rarely been adopted by educational researchers, perhaps
due to its inferior statistical power (Chou and Huberty, 1992).
Educational researchers often make simultaneous inferences based on R as
yielded by the original J-N technique. Hence, the purpose of this paper is
to examine the empirical performance of the J-N technique with respect to
the appropriateness of making simultaneous inferences under various
simulated data settings.

Two types of simultaneous error are conceivable: (1) detecting a
region of significance when in fact there is none; and (2) obtaining a
region of significance that contains the point at which the two
populations are equal in expected criterion score. The former type of
error was investigated by Shields (1978). The rate of this type of error
associated with the J-N technique was found to be approximately .15 at a
nominal α of .05 in a complete null data condition (two population
regression lines are identical). The latter type of error under the
heterogeneous population slopes condition was explored by Chou and Huberty
(1992). It was surprisingly found that the rate of this type of error
associated with the original J-N technique was approximately at the
nominal α level. Based on these empirical results, it appears that the
original J-N technique can be used to make simultaneous inferences
provided that the error rate of the first type can be controlled. The
overall null hypothesis test of no region of significance as presented in
the original paper of Johnson and Neyman (1936) might be needed to serve
this purpose.

Without performing the first step (the omnibus hypothesis testing
procedure), educational researchers typically go straight to the
computation of R. Rogosa (1980) has shown through some algebraic
manipulations that R is composed of all covariate points (denote the
covariate by X) which satisfy a second-degree inequality of the form
AX^2 + 2BX + C > 0. The left-hand side of the inequality takes on the form
of a parabola. Setting it to 0, we get two bounds for R, denoted as X_-
(for the smaller root) and X_+ (for the larger root). It is important to
note that the sign of A determines the form of R in terms of X_- and X_+.
The inequality has no real solutions when B^2 - AC is negative.

A common practice is to use the J-N technique only when the test of slope
homogeneity is rejected. It can be shown that the sign of A is positive
when slope homogeneity is rejected. Under such a situation the parabola
opens upward, and consequently the region of significance will always
exist. Therefore, the J-N technique has been presented as yielding a
region of nonsignificance between X_- and X_+ (Huitema, 1980; Pedhazur,
1982). According to the empirical results of Chou and Huberty (1992), it
appears that simultaneous inferences may be appropriate for R computed
after the rejection of the slope homogeneity assumption. However, a region
of significance also exists when two regression lines are parallel and
non-null (i.e., equal slopes, different intercepts). Rogosa (1980) argued
that the J-N technique could be used regardless of whether the assumption
of slope homogeneity is rejected. When the test of slope homogeneity is
not rejected (A is negative), the parabola opens downward. Consequently, R
is composed of the X points between X_- and X_+. This paper examines the
appropriateness of making simultaneous inferences for R or R' under two
situations: (1) when the test of the slope homogeneity assumption is
rejected, and (2) when the test of the slope homogeneity assumption is not
rejected. The necessity of the omnibus null hypothesis testing step for
the original J-N technique is examined under each situation.
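
The way the sign of A shapes R can be sketched in Python. The text above gives only the form of the inequality, so the coefficient formulas in `jn_coefficients` are an assumption of this sketch: they are the expansion of the usual pointwise test D(X)^2 > F_crit * Var[D(X)] for the difference between two fitted lines, with the critical value of F on 1 and n_1 + n_2 - 4 degrees of freedom supplied by the caller. All function and argument names are hypothetical.

```python
import math


def jn_coefficients(d0, d1, s2, n1, n2, xbar1, xbar2, ssx1, ssx2, f_crit):
    """Coefficients of A*X**2 + 2*B*X + C > 0 for the pointwise region.

    d0, d1     : estimated intercept and slope differences (a0 - b0, a1 - b1)
    s2         : pooled residual variance, SSE_a / (n1 + n2 - 4)
    ssx1, ssx2 : within-group sums of squares of the covariate
    f_crit     : critical value of F(1, n1 + n2 - 4), supplied by the caller
    """
    a = d1 ** 2 - f_crit * s2 * (1 / ssx1 + 1 / ssx2)
    b = d0 * d1 + f_crit * s2 * (xbar1 / ssx1 + xbar2 / ssx2)
    c = d0 ** 2 - f_crit * s2 * (1 / n1 + 1 / n2
                                 + xbar1 ** 2 / ssx1 + xbar2 ** 2 / ssx2)
    return a, b, c


def region_of_significance(a, b, c):
    """Return (form, x_minus, x_plus), or None when there are no real bounds."""
    disc = b ** 2 - a * c
    if a == 0 or disc < 0:
        return None  # degenerate or no real roots; not handled in this sketch
    root = math.sqrt(disc)
    x_minus, x_plus = sorted(((-b - root) / a, (-b + root) / a))
    # A > 0: parabola opens upward, so R lies outside the two roots.
    # A < 0: parabola opens downward, so R lies between the two roots.
    form = "outside" if a > 0 else "between"
    return form, x_minus, x_plus


def in_region(x, region):
    """True when the covariate value x falls inside R."""
    form, x_minus, x_plus = region
    between = x_minus < x < x_plus
    return between if form == "between" else (x < x_minus or x > x_plus)
```

In a simulation, an R would be counted as incorrect (the second type of simultaneous error) when `in_region` returns True for the covariate value at which the two population lines intersect.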

MONTE CARLO SIMULATION PROCEDURES 

Three settings of regression coefficients were selected for the computer
Monte Carlo simulations. They are shown in Table 1. The variances of the
covariate and the random error component were set as 9 and 36,
respectively. The outcome variable under investigation was the
simultaneous error of the second type. In each setting, the regression
parameters were determined such that the two population regression lines
intersect in the middle of the covariate data range (grand covariate mean
X = 20). The three settings differ in the extent of slope heterogeneity
(slight, moderate, and severe in the author's arbitrary judgement). Within
each setting, three sample size combinations were considered (10:10, 20:20,
and 30:30) with 10,000 simulated experiments for each sample size
combination. The J-N technique was applied in each simulated experiment.
With nine artificially generated data conditions (three regression settings
and three sample size combinations), the total number of simulated
experiments in this study was 90,000.
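
The data-generating step can be sketched as follows. The coefficients come from Table 1 and the variances from the paragraph above; everything else is an assumption of this sketch rather than a description of the author's program. In particular, the text does not state the distributions used, so normal covariates and normal errors are assumed, both groups are assumed to share the covariate mean of 20, and the names `SETTINGS`, `simulate_group`, and `simulate_experiment` are hypothetical.

```python
import random

# (a0, a1) for group 1 and (b0, b1) for group 2, from Table 1.
SETTINGS = {
    "slight":   ((40.0, 0.5), (60.0, -0.5)),
    "moderate": ((30.0, 1.0), (70.0, -1.0)),
    "severe":   ((20.0, 2.0), (100.0, -2.0)),
}
X_MEAN, X_SD, ERROR_SD = 20.0, 3.0, 6.0  # variances 9 and 36


def intersection(setting):
    """Covariate value at which the two population lines cross."""
    (a0, a1), (b0, b1) = SETTINGS[setting]
    return (b0 - a0) / (a1 - b1)


def simulate_group(intercept, slope, n, rng):
    """Draw n (x, y) pairs; normality is an assumption of this sketch."""
    xs = [rng.gauss(X_MEAN, X_SD) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0.0, ERROR_SD) for x in xs]
    return xs, ys


def simulate_experiment(setting, n_per_group, rng):
    """One simulated two-group experiment with equal group sizes."""
    (a0, a1), (b0, b1) = SETTINGS[setting]
    return (simulate_group(a0, a1, n_per_group, rng),
            simulate_group(b0, b1, n_per_group, rng))


rng = random.Random(1992)  # arbitrary seed, for reproducibility only
group1, group2 = simulate_experiment("moderate", 20, rng)
```

Calling `intersection` on each setting returns 20.0, which confirms that all three pairs of population lines cross at the grand covariate mean.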
SIMULATION RESULTS

The performance of the omnibus test in the settings of slight,
moderate, and severe slope heterogeneity is shown in Table 2 through
Table 4 under the "Omnibus Test" heading. The total number of rejections
of the omnibus hypothesis, the number of incorrect R's computed following
the rejection of the omnibus hypothesis, and the percent of incorrect R's
are reported under this heading. The omnibus test obviously failed to
control the proportion of incorrect R's at the nominal alpha level in
the slight heterogeneous slopes setting. As the severity of slope
heterogeneity increased, the total percentage of incorrect R's approached
the nominal value of .05 (see Table 3 and Table 4). There was a tendency
for the error rate to drop as the sample size increased. The X_- and X_+
reported are the average lower and upper boundaries of the nonsignificant
region. Note that X_- and X_+ were not reported for A < 0 because the
computed R would make little sense under the current simulated settings.

Table 2 through Table 4 also showed that the computation of R for A <
0 after a significant omnibus null hypothesis test produced incredibly high
simultaneous error rates. On the other hand, for A > 0, the empirical
simultaneous error rates got quite close to the nominal α (the largest
error rate was .086, found in Table 2 at sample size 10:10).

Table 5 through Table 7 reported the simulation results without
performing an omnibus test. The importance of the slope homogeneity test
was strongly revealed. For A < 0, simultaneous error rates were
unacceptably high, whereas for A > 0, the simultaneous error rates
generally were controlled at the nominal alpha level. For A > 0, the
simultaneous error rate was unacceptably large only when sample sizes were
small, found in Table 5 at sample size 10:10.
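
The reported error rates are simple proportions of incorrect R's among the R's computed. The sketch below recomputes them for the A > 0 rows of Table 7. The Monte Carlo standard error is added here only as one conventional yardstick for judging closeness to the nominal level; it is an assumption of this sketch and was not part of the original analysis.

```python
import math

# (sample sizes, number of R's computed, number of incorrect R's) for A > 0,
# copied from Table 7 (severe slope heterogeneity, no omnibus test).
ROWS = [("10:10", 9270, 500), ("20:20", 9994, 467), ("30:30", 10000, 491)]
NOMINAL_ALPHA = 0.05


def error_rate(n_regions, n_incorrect):
    """Empirical simultaneous error rate: incorrect R's over R's computed."""
    return n_incorrect / n_regions


def monte_carlo_se(p, n):
    """Binomial standard error of a proportion p estimated from n replications."""
    return math.sqrt(p * (1 - p) / n)


for size, n_regions, n_incorrect in ROWS:
    rate = error_rate(n_regions, n_incorrect)
    se = monte_carlo_se(NOMINAL_ALPHA, n_regions)
    print(f"{size}: rate = {rate:.3f}, SE at nominal alpha = {se:.4f}")
```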

DISCUSSION 

The omnibus hypothesis testing procedure as proposed in the original
paper of Johnson and Neyman (1936) was unable to control the simultaneous
error rate at the nominal α level. The first step of the J-N technique
appeared to be unnecessary. Because the covariates used in most
educational studies are often random, making non-simultaneous inference at
any single covariate point in R is seldom useful. Fortunately, the
original J-N technique appeared to be still valid for making
simultaneous inferences over the entire range of R, given that the test of
the slope homogeneity assumption is rejected. A warning should be made
against the use of the J-N technique when the test of slope homogeneity is
not rejected. In the light of the results from this study, the common
practice of using the J-N technique only when the test of the slope
homogeneity assumption is rejected is still recommended. Another finding
of this study worth attention is that sufficient sample sizes must be
obtained for the application of the J-N technique. In the settings of the
current study, sample sizes larger than 20:20 were required to secure the
validity of the simultaneous inference.

Nevertheless, the ability of the original J-N technique to control
the simultaneous error rate at the nominal level should not be overstated.
In comparing two regression lines, a region of significance will always
exist if the two lines are not identical and sample sizes are sufficiently
large. In fact, when two different population regression lines are
compared, the two population means are equal only at the covariate point
where the two lines intersect. Some of the differences between two
population regression lines may be too trivial to be declared practically
significant. Cohen (1988) stressed the need for an awareness of the effect
magnitude over and above which the researcher may wish to conclude that the
difference between two population means is practically significant. With
this idea in mind, future users of the J-N technique may want to find a
region of significance that allows for a researcher-determined magnitude
of significant difference. This may warrant further studies.
Table 1

Regression Coefficients in the Three Simulation Settings
Values of (a_0, a_1) for Group 1 and (b_0, b_1) for Group 2

              Extent of Slope Heterogeneity
              Small        Medium       Large
Group 1       40, .5       30, 1        20, 2
Group 2       60, -.5      70, -1       100, -2



Table 2

Simulation Results Under Slight Extent of Slope Heterogeneity
R Obtained After a Significant Omnibus Test

         Omnibus Test                       Slope Homogeneity Test
         Total        Total                                    Incorr.
Size     Rejections   Incorr. R's  Percent              n      R       P      X_-     X_+
10:10    1259         340          .27       A < 0      288    214     .74    •       •
                                             A > 0      971    126     .13    13.27   26.74
20:20    2414         370          .16       A < 0      271    185     .68    •       •
                                             A > 0     2143    185     .09    14.07   25.87
30:30    3622         398          .11       A < 0      269    180     .67    •       •
                                             A > 0     3353    218     .07    14.07   25.72

The slope combination is (.5, -.5).



Table 3

Simulation Results Under Moderate Extent of Slope Heterogeneity
R Obtained After a Significant Omnibus Test

         Omnibus Test                       Slope Homogeneity Test
         Total        Total                                    Incorr.
Size     Rejections   Incorr. R's  Percent              n      R       P      X_-     X_+
10:10    3644         433          .12       A < 0      303    150     .50    •       •
                                             A > 0     3341    283     .09    14.51   24.51
20:20    7261         469          .06       A < 0      144     64     .44    •       •
                                             A > 0     7117    405     .06    16.54   23.40
30:30    8986         524          .06       A < 0       50     23     .46    •       •
                                             A > 0     8936    501     .06    17.56   22.42

The slope combination is (1, -1).
Table 4

Simulation Results Under Severe Extent of Slope Heterogeneity
R Obtained After a Significant Omnibus Test

         Omnibus Test                       Slope Homogeneity Test
         Total        Total                                    Incorr.
Size     Rejections   Incorr. R's  Percent              n      R       P      X_-     X_+
10:10    8845         527          .06       A < 0       94     27     .29    •       •
                                             A > 0     8751    500     .057   17.52   22.37
20:20    9975         467          .05       A < 0        0      0     0      •       •
                                             A > 0     9975    467     .047   18.85   21.14
30:30    10000        491          .049      A < 0        0      0     0      •       •
                                             A > 0    10000    491     .049   19.15   20.85

The slope combination is (2, -2).
Table 5

Simulation Results Under Slight Extent of Slope Heterogeneity
R Computed Without an Omnibus Test

         Total             Slope Homogeneity Test
         Obtainable                           Incorr.
Size     R                             n      R       P      X_-      X_+
10:10    2603               A < 0     1069    397     .37    •        •
                            A > 0     1534    130     .12    7.96     35.25
20:20    4236               A < 0     1130    297     .263   •        •
                            A > 0     3106    170     .055   -10.31   32.88
30:30    5647               A < 0     1122    265     .236   •        •
                            A > 0     4525    226     .050   5.U      30.77
Table 6

Simulation Results Under Moderate Extent of Slope Heterogeneity
R Computed Without an Omnibus Test

         Total             Slope Homogeneity Test
         Obtainable                           Incorr.
Size     R                             n      R       P      X_-      X_+
10:10    5724               A < 0     1141    220     .193   •        •
                            A > 0     4583    307     .067   4.64     35.45
20:20    8645               A < 0      546     77     .141   •        •
                            A > 0     8099    390     .048   14.20    30.09
30:30    9639               A < 0      184     31     .168   •        •
                            A > 0     9455    460     .049   16.16    23.62
Table 7

Simulation Results Under Severe Extent of Slope Heterogeneity
R Computed Without an Omnibus Test

         Total             Slope Homogeneity Test
         Obtainable                           Incorr.
Size     R                             n      R       P      X_-      X_+
10:10    9538               A < 0      268     27     .101   •        •
                            A > 0     9270    500     .054   16.49    22.99
20:20    9995               A < 0        1      0     0      •        •
                            A > 0     9994    467     .047   13.34    21.15
30:30    10000              A < 0        0      0     0      •        •
                            A > 0    10000    491     .049   19.15    20.85



